Fully coherent follow-up of continuous gravitational-wave candidates 



M. Shaltev, R. Prix 
Albert- Einstein- Institut, Callinstr. 38, 30167 Hannover, Germany 
(Dated: Mon Mar 11 10:10:31 2013 +0100) 
(LIGO-P1200185-v2) 
(MmitID: d849404-CLEAN) 

The search for continuous gravitational waves from unknown isolated sources is computationally 
limited due to the enormous parameter space that needs to be covered and the weakness of the 
expected signals. Therefore semi-coherent search strategies have been developed and applied in 
distributed computing environments such as EinsteinQHome, in order to narrow down the parameter 
space and identify interesting candidates. However, in order to optimally confirm or dismiss a 
candidate as a possible gravitational-wave signal, a fully-coherent follow-up using all the available 
data is required. 

We present a general method and implementation of a direct (2-stage) transition to a fully- 
coherent follow-up on semi-coherent candidates. This method is based on a grid-less Mesh Adaptive 
Direct Search (MADS) algorithm using the ^-statistic. We demonstrate the detection power and 
computing cost of this follow-up procedure using extensive Monte-Carlo simulations on (simulated) 
semi-coherent candidates from a directed as well as from an all-sky search setup. 



I. INTRODUCTION 

Continuous gravitational waves (CWs) are expected to 
be emitted from rapidly spinning non-axisymmetric com- 
pact objects, e.g. neutron stars. The computational cost 
of a coherent matched-filtering detection statistic, such 
as the /"-statistic [I], is small provided the parameters 
of the source (i.e. sky position a, 5, frequency /, fre- 
quency derivatives /, . . . ) are known. However, wide 
parameter-space searches for unknown sources quickly 
become computationally prohibitive, due to the large 
number of points in parameter space (templates) that 
need to be searched [2J. 

In order to first reduce the parameter space to smaller, 
more promising regions, semi-coherent search techniques 
have been developed [3HB] and are currently being used 
[S], for example in the Einstein@Home distributed 
computing environment |9| . In a semi-coherent search the 
total amount of data T is divided into N shorter segments 
of duration AT. The coherent statistics from the individ- 
ual segments are combined to a new semi-coherent statis- 
tic. At fixed computing cost these semi-coherent methods 
are (typically) more sensitive than fully-coherent searches 

Structuring a wide parameter-space search into hier- 
archical stages, which increasingly concentrate computa- 
tional power onto the more promising regions of param- 
eter space, was first described in [2J and elaborated fur- 
ther in [3], where a two-stage semi-coherent hierarchical 
search was considered. An extended hierarchical scheme 
with an arbitrary number of semi-coherent stages and 
a final fully-coherent stage was studied numerically in 
[4], which concluded that three semi-coherent stages will 
typically be a good choice. In [TT] and [T2J the use of an 
optimization procedure has been considered in the pro- 
cess of estimation of the source parameters, once a candi- 
date is considered as a detection. In both cases, however, 
no practical method or implementation was provided for 



the systematic coherent follow-up of semi-coherent can- 
didates. 

The aim of the present work is to introduce such a 
coherent follow-up search strategy and implementation. 
This is achieved by exploring the parameter space around 
a semi-coherent candidate using a Mesh Adaptive Direct 
Search (MADS) algorithm. Using this method, we find 
that a fully-coherent follow-up (using all of the available 
data) of initial semi-coherent candidates is computation- 
ally feasible. 

This paper is organized as follows: in Section [TT] we 
describe the relevant basic concepts in CW searches, in 
Section III we propose a search strategy for the system- 



atic follow-up of CW candidates, in Scction |TV] we present 
a Monte-Carlo study and in Section |V] we discuss the re- 
sults. 



Notation 

We distinguish a quantity Q when referring to a fully- 
coherent stage using a tilde, Q and when referring to a 
semi-coherent stage using an overhat, Q. Averaging over 
segments is denoted by an overbar, Q. 



II. CONTINUOUS GRAVITATIONAL WAVES 

Continuous gravitational-wave signals are quasi- 
monochromatic and sinusoidal in the source frame and 
undergo phase- and amplitude-modulation due to the di- 
urnal and orbital motion of the detectors. The phase 
evolution of the signal at a detector can be approximated 
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where $o is the initial phase, / < - fc - ) = are 
derivatives of the signal frequency / at the solar system 
barycenter (SSB) at reference time to, c is the speed of 
light, r(t) is the vector pointing from the SSB to the de- 
tector and n is the unit vector pointing from the SSB to 
the gravitational-wave source. 



A. Detection statistic 

Following [TJ [T3] the gravitational-wave response of a 
detector can be expressed as a sum over four (detector- 
independent) amplitude parameters multiplying four 
(detector-dependent) basis waveforms. The amplitude 
parameters can be analytically maximized over and the 
resulting detection statistic, known as the ^-statistic, is 
therefore a function only of the template "phase param- 
eters" A = {a, S, f, /, ...}, where a (right ascension) and 
S (declination) denote the sky position of the source. 

In the presence of a signal the fully-coherent detection 
statistic 2T follows a non-central ^-distribution with 4 
degrees of freedom and a non-centrality parameter given 
by the squared signal to noise ratio (SNR), p 2 . The ex- 
pectation value is therefore 



with variance 



E[2J=] = 4 - 



cr 2 [2J"] = 2(4 + 2p 2 ) 
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On the other hand, in the semi-coherent approach we 
divide the available data into N segments of duration 
AT and combine the individual coherent statistics of the 
segments to compute a semi-coherent statistic, namely 
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where 2Tk is the coherent ^-statistic in segment k. The 
quantity N 2T follows a non-central ^-distribution with 
47V degrees of freedom, thus the expectation value of 2T 
is 



with variance 



E[2T] = 4 + p 2 



a 2 [2F} = -(A + 2p 2 ) 
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where p 2 is the average SNR 2 over all segments, i.e. 

N 
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and p\ denotes the SNR 2 in segment k. 



B. Mismatch and Fisher matrix 

A search for sources with unknown signal parame- 
ters implies a loss of detection power compared to the 
perfectly-matched case. To quantify this we use the no- 
tion of mismatch p, as first introduced in [TUfTS]. This is 
defined as the fractional loss of expected SNR 2 at some 
parameter space point A compared to the expectation 
p 2 {\ s ) at the signal location \ s , namely 



_p 2 (A s )-p 2 (A) 
P 2 (A.) 
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such that p £ [0,1]. Taylor expansion in small offsets 
AA = A — X s around the signal location yields 



p ee g l3 {\ s ) AX l AX J + C(AA 3 ) 
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where implicit summation over repeated parameter-space 
indices i, j applies, and the symmetric positive-definite 
matrix is commonly referred to as the parameter- 
space metric. 

Neglecting higher order terms, one often uses the "met- 
ric mismatch approximation" , namely 



M* = 9ij(^s) AA^AA^ 



(10) 



as a distance measure, with a range p* € [0, oo). This 
metric mismatch p* plays an important role in grid-based 
searches, where one typically constructs template banks 
in such way that the mismatch of any putative signal 
and the "closest" template is bounded by a maximal mis- 
match m, i.e. 



p < m 



(11) 



everywhere in the template bank. 

In the presence of noise, p as defined in Eq. Q is not 
directly accessible, and we therefore introduce a related 
quantity, namely the fractional loss of measured SNR 2 , 
namely 



2J"(A s )-4 



(12) 



Note that p < 1, but contrary to ([8]) it can also be 
(slightly) negative, as we can have 2J\X S ) < 2J r (A) due 
to noise. 

For semi-coherent searches the metric is found [3] as 
the average of the fully-coherent metrics over all the seg- 
ments, namely 
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where g%j,k is the coherent metric ^ in segment k. 

A standard tool for parameter estimation is provided 
by the Fisher information matrix, which characterizes 
the statistical uncertainty of the maximum-likelihood es- 
timators (MLE) A^ LE for the signal parameters A* . This 
can be formulated [TMT5] as the well-known Cramer-Rao 
lower bound on the variance of an unbiased MLE (i.e. 
e [*mle\ = K), namely 



(14) 



where the matrix jr~ 1 } i "' denotes the inverse of the 
Fisher matrix Fy, which is closely related (e.g. [T7]) to 
the metric gij, namely 



Vij = P 



(15) 



A semi-coherent search over N segments can be consid- 
ered as N different measurements, thus the semi-coherent 
Fisher matrix yields [19] 
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Assuming constant SNR for the different segments we 
can rewrite ( 16 ) in terms of the semi-coherent metric 



( 13 1, namely 
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and thus 
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where (f- 7 is the inverse matrix of gij . 



C. Computing cost 

The computing cost C of a fully-coherent (or an ideal 
semi-coherent |10p search is primarily due to the com- 
putation of the ^-statistic over all the templates. For a 
search over Af templates using N segments of data from 
iVdet detectors (TUJ, the computing cost C is 



C = NNN det ci 



(19) 



where c\ is the implementation-dependent comput- 
ing cost for a single template, segment and detector. 
A method of /"-statistic computation based on short 
Fourier transforms (SFTs) [2U] of length Tsft is currently 
widely used in CW searches and will be considered in the 
present work. The cost per template in this case is pro- 
portional to the segment duration, namely 
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where Cq FT is implementation- and hardware-dependent 
fundamental computing cost per SFT. Using the total 
number of SFTs 
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we can write the total computing cost (191 of the SFT- 
method as 



C — Af Asft eg 



SFT 
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In grid-based searches the number of templates re- 
quired to cover the search parameter space P is given 
by the general expression [22] 



Af: 
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(23) 



where 9 is the normalized lattice thickness, n is the num- 
ber of search dimensions, m is the maximal template- 



bank mismatch (111 and detg is the determinant of the 
parameter space metric ([£]). The normalized thickness is 
a constant depending on the grid structure, e.g. for a 
hyper-cubic lattice 6z n = n n l 2c 2r n . The metric g^ de- 
pends strongly on the duration AT and the number of 
segments N, in such a way that longer observation times 
typically require a (vastly) increased number of templates 

la- 



in. COHERENT FOLLOW-UP OF 
SEMI-COHERENT CANDIDATES 

A. Basic two-stage search strategy 

Here we introduce a simple two-stage strategy for 
following-up candidates from semi-coherent searches. In 
the first stage, called refinement, we employ a finer search 
using the semi-coherent statistic 2T to improve the ini- 
tial maximum-likelihood estimator. In the second stage, 
called zoom, we apply the fully-coherent statistic 2JF us- 
ing all the data T, in order to test whether the candidate 
is inconsistent with Gaussian noise and if it further agrees 
with the signal model. 

The motivation for this 2-stage approach can be seen 
from an example 2-D search grid shown in Fig. [l] The 
search templates are generally placed such that a pu- 
tative signal A s will be recovered with a loss of SNR 



bounded by a maximal mismatch m, as given in Eq. (Ill, 
namely 



9ij 



AA*AA J < m 



(24) 



where equality defines an (n-dimensional) iso-mismatch 
ellipse. The initial semi-coherent search will yield "can- 
didates" A c for which the statistic 2J 7 exceeds a certain 
threshold and is higher than neighboring templates. 

The initial refinement stage of our follow-up strat- 
egy therefore consists in finding the (nearby) parameter- 
space point Amle of the actual (local) maximum in the 
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semi-coherent isomismatch ellipse grid point 



A, « A 




MLE 



semi-coherent Fisher ellipse 



FIG. 1: 2-D search grid in {/, /} space. The black dots are 
the search templates, placed such that the loss of SNR on any 
putative signal X s will be bounded by a maximal mismatch m, 
which defines the semi-coherent iso-mismatch ellipses. The 
semi-coherent Fisher ellipse centered on the MLE Amle is 
used to constrain the zoom parameter space. The aim of the 
zoom stage is to find Amle 



statistic 2J r (A) (which is smooth function of A), referred 
to as the maximum-likelihood estimator (MLE) . This can 
be achieved simply by a denser placement of templates 
using the original statistic, i.e. by keeping the search 
setup unchanged in terms of the number and length of 
segments. 

In the zoom stage we fully-coherently search the Fisher 
ellipse centered on the semi-coherent MLE Amle- This 
defines the parameter-space region that should contain 
the signal location A s with confidence corresponding to 
71b standard deviations, i.e. 




Amle 

ellipse bounding box 



semi-coherent Fisher ellipse 



FIG. 2: 2-D example: Fisher ellipse (25 1 defining the zoom 



search space, centered on the semi-coherent MLE Amle- The 
extents {A/, A/} of the bounding box are given by Eq. (27 1. 



r tf 5X l 6X j < nl , 



(25) 



in the refinement stage and a subscript Z for quantities 
in the zoom stage. 

The search volume for the refinement stage depends 
on the template bank construction of the original semi- 
coherent search. Ideally one iso-mismatch ellipse corre- 
sponding to the original template-bank construction (see 
Fig. [I]) should be sufficient. In practice, however, it might 
often be neccessary to use several grid spacings in each 
direction, if the template bank was not originally con- 
structed in a strictly metric way. In this case the exact 
number of grid spacings will have to be empirically de- 
termined in a Monte Carlo study. 



Bounding box and volume of n- dimensional ellipses 



where SX l = A MLE — A* . Note that the Fisher ellipse actu- 
ally describes the fluctuations of the maximum-likelihood 
estimator Amle for given signal location. However, pro- 
vided the likelihood-manifold is not strongly curved, this 
also describes our uncertainty of the signal location for 
given MLE Amle, as indicated in Fig. [2} The zoom stage 
will yield the fully-coherent maximum-likelihood estima- 
tor Amle, which represents our best estimate for the sig- 
nal parameters X s . Thus the two-stage search strategy 
corresponds to the transition 



_ refinement ^ 

X c * Amle " 



In the following it will be useful to express the bound- 
ing box and volume of an n-dimensional ellipse, namely 
for the iso-mismatch ellipse of Eq. ( 24 1 and the Fisher el- 
lipse of Eq. (25 1. The general form of the n-dimensional 



ellipse equation is 



dj dX i dX j = R 2 



(26) 
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where Gij is a positive-definite symmetric matrix. The 
extents AA* of a bounding box along coordinate axes X 1 
(as indicated in Fig. [2]) can be obtained from the diagonal 
elements of the inverse matrix, {G~~ namely 



In the following we use a subscript R to denote quantities 



AA' = 2R v^G- 1 } 11 - 



(27) 
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The ellipse coordinate volume is expressible via the ma- 
trix determinant, det G, namely 



V 



R n 



=V„, 



where V„ = 



i/2 



r(l+n/2) 



\fdetG 
is the volume of unit n-ball. 



(28) 



B. Classification of zoom outcomes 

Assuming a real CW signal, we can estimate the range 
of expected values of the fully-coherent zoom ^-statistic 
in A s . From Eq. |5| we can obtain a (rough) esti- 
mate of the average-SNR 2 from the measured average- 
SNR 2 of the semi-coherent maximum-likelihood estima- 
tor, namely 



n 2 

P MLE 



2J~MLE — 4 



(29) 



The SNR of the fully-coherent search is linear in the 
number of segments N, i.e. 

P 2 = AVmle • (30) 

Substitution of the above expression in Eq. ^ yields the 
expectation for the fully-coherent matched filter in Amle, 
namely 



2JF Q ee E[2T] « 4 



N p 2 



MLE 



(31) 



Further substitution of Eq. ( 30 1 in Eq. ^ yields the cor- 
responding variance as 



a 2 [2 J] 



2(4 + 2AV MLE ) 



(32) 



These quantities are useful for defining what we mean by 
confirming a CW signal. 

Note that the uncertainty in the original SNR- 
estimation in Eq. ( 29 1 results in a distribution around the 



final estimate of Eq. ( 31 ) that is wider than estimated by 



Eq. (32 1. This effect can be computed analytically and 



empirically, and is found to amount to about a factor of 
2. _ 

Depending on the maximal 2J- value found in the final 
zoom stage, we can distinguish 3 possible outcomes: 

• Consistency with Gaussian noise (G) - the fully- 
coherent 2T value does not exceed a threshold 



— ~(G) 



(33) 



JG) 



where 2J r th is chosen to corresponds to some 
(small) false-alarm probability ptA in Gaussian 
noise. 

For example, a threshold 2J r th = 60 corresponds 
to a very small false-alarm probability of order 
10~ 12 in a single template, as given by Eq. (43 1. 



• Non-Gaussian origin (—>G) - the candidate is loud 
enough to be inconsistent with Gaussian noise at 
the false-alarm probability pfA, i.e. 



XF > 2T. 



(G) 
Hi 



(34) 



We define signal recovery (S) as a subclass of ->G, 
namely if the final zoomed candidate 2T exceeds 

~(G) 

the Gaussian-noise threshold 2J r th and falls into 



the predicted signal interval given by Eqs. (31 1 and 
( |32| (at some confidence level). We can write this 
as 
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~(S) ~ 
2T = 2F 

£ '- r max — o 
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2T Q 



(35) 
,a a }, and 



n u 0~ o , where n u determines the 
desired confidence level. In this work we consider 
n u = 6, which corresponds roughly to a confidence 
of - 99.6%. 

Note that there can be cases where a zoomed candi- 
date ends up in -^G but does not make it into the signal 

~(G) ~ ~(S) 
recovery (5) band, e.g. typically 2T th < 2T < 2T th . 

There can be different reasons for this, e.g. the search 
algorithm converged to a secondary maximum in the re- 
finement or zoom stage, the signal model deviates from 
reality and requires modification, or the "signal" found is 
of non astrophysical origin (eg a detector-noise artifact). 
Generally further investigation will be required for all 
candidates falling into the non-Gaussian category (^G). 



C. Grid-based computing-cost of the zoom stage 

We do not consider a grid-based follow-up method in 
this paper, but it is instructive to estimate the corre- 
sponding computing-cost for later comparison. To es- 
timate the number of templates required for the fully- 
coherent search, we can use Eq. ( 28 1 to compute the 



volume of the follow-up Fisher ellipse, Eq. (25 1, and di 



vide it by the volume covered by one coherent template, 
Eq. ( 24 ) . Namely, the Fisher-ellipse volume is given by 



(36) 



while the coherent template-volume at mismatch m is 

m n/2 



V 



Vn 



y/detg 

therefore we can estimate then number of template as 



(37) 
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Consider a follow-up of a candidate from a directed n = 2 
search in {/, /} (e.g. see Fig. [lj. Assuming a semi- 
coherent search using N = 200 segments of AT = 1 d 
duration without gaps, and a fully-coherent observation 
time of T = 200 d. Using the expressions found in [2"3"] . 
the determinants of the two-dimensional coherent and 
the semi-coherent metrics are found as 



^detg = n 2 T 3 
\J det <? = 7r 2 AT 
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3 7(g) 
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(39) 
(40) 



where 7 ~ \fhN is the spindown refinement factor. 
Putting everything together in Eq. ( 38 1 , we obtain 



N ~ B 



\/5 p 2 TO 



(41) 
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where we used N = T/AT. For a signal with p* 
1, Ub = 24 1 and m = 0.1 the number of te mpl ates 
is therefore JV « 5.1 x 10 5 . Thus, using Eq. (22) for 
2 detectors and the SFT method with T SF t = 1800 s, 
the zoom computing cost is C ~ 11 min per candidate, 
where we used the fundamental computing cost Cq FT = 
7 x 10" 8 s[I0]. 

In the more general case where the sky position of 
the source is also unknown, the number of sky points 
typically scales at least quadratically with the obser- 
vation time [53] (for coherent integration longer 
than few days), thus generally resulting in completely 
prohibitive computational requirements for grid-based 
follow-up searches. In particular, extending the directed 
search example from the previous paragraph to an all-sky 
follow-up would require A4ky ~ 1.3 x 10 6 sky points 2 , or 
a total of m 6.8 x 10 11 templates. 

For comparison, using the grid-less search algorithm 
discussed in the next sections, it is possible to coherently 
follow up 2-D directed candidates in less than 2 minutes, 
see Fig. |4d| and all-sky candidates in about 1 hour per 



candidate, see Fig. 5d 



D. Mesh Adaptive Direct Search (MADS) 

A significant difference between the hierarchical search 
strategies discussed in pHl] and this work is the method 
of template bank construction at the different stages. 
Namely, we consider a grid-less method for exploring the 
parameter space. 



1 This large «b value is found to contain the signal location in 
more than 98 % of the cases even for weak signals, where the 
Fisher-matrix may be a poor predictor, see [16] . 

2 The number of sky templates has been estimated by numeri- 
cal computation of the sky part of the metrics g and g using 
FstatMetric_v2 from LALSUITE [25], see also [17]. 



The MADS class of algorithms for derivative-free op- 
timization has been first introduced in [26j and further 
developed in [37] and [35] among others. In this subsec- 
tion we only introduce some of the control parameters 
of the algorithm required in the construction of MADS- 
based .F-statistic searches, for an in-depth treatment and 
proofs we refer the reader to the cited publications. 

MADS consist of the iteration of two steps, called 
search and poll, in which trial points are constructed and 
evaluated in order to find an extremum. In the search 
step any strategy can be applied to construct trial points. 
In this work we use quadratic models (quadratic form) to 
approximate the objective function from a sample of ob- 
jective values [28]. If the local exploration in the search 
step fails to generate a new solution, a set of poll points 
is generated using a stochastic or deterministic method. 
Stochastic means that the poll points are generated ran- 
domly 20 . where deterministic refers to the usage of 
pseudo-random Halton sequences [57] . However both 
methods generate points which form a dense set in the 
unit sphere after an infinite number of iterations. For a 
given starting point A c with parameter space boundaries 
AAb, initial step sizes dX and a method for generation 
of poll points, the discretization of the parameter space 
A™ at iteration k is governed by a fixed rational number 
> 1 and the coarsening w + > and refining w~ < — 1 
exponents. If the current iteration generates a better so- 
lution, the discretization in the next iteration is coarser, 
namely A^ +1 = < + A^ n , otherwise A^ +1 = u%~ A% [26]. 
The algorithm stops if an improved solution cannot be 
found or the total number of evaluated parameter space 
points p reaches some given maximum p max - 



E. MADS-based follow-up algorithm 

From the point of view of the MADS algorithm, the 
function to optimize is a black-box requiring some input 
to produce a single output value. The black-box in our 
case is either the computation of the semi-coherent T- 
statistic IT of Eq. |4j) in the refinement, or the fully- 
coherent F-statistic IT in the zoom stage. In order 
to minimize the possibility of convergence to secondary 
maxima, we run multiple instances of the MADS search 
in each stage varying the mesh coarsening exponent w + . 
The minimal and maximal wi, av coarsening ex- 

min nidx <— > 

ponent determine the number of MADS steps in each 
pass, namely n s tc P s = w+ ax - m+ jn + 1. Thus we con- 
sider our search algorithm composed of several instances 
of MADS, see Fig. [3] The input of the search algo- 
rithm is the candidate A c to follow-up, the search bound- 
aries AXr/z around the candidate and a set of MADS 



input parameters, namely {dX, lib, w+ in , w~ } . In 

the zoom stage the search boundaries (AXz) are esti- 
mated from the bounding box of the Fisher ellipse, using 
Eq. (27). For the refinement stage the search boundaries 
(AAfl) generally have to be determined depending on the 
template-bank setup of the original semi-coherent search. 
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Ao, AAb, dX, Ub, 


W min> ™max, W 







1st pass - deterministic 



"-steps searches with 
starting point Ao 



2nd pass 


- stochastic 


"stops searches with 


starting 


point Ao 


/ 





3th pass - deterministic 



n stc 



searches with start- 



ing point the loudest point 
of the 1st and 2nd pass 



4th pass - stochastic 



"steps searches with a ran- 
dom starting point from 
the vicinity of the loudest 
point of the 1st and 2nd pass 



Output 



the loudest of all searched points 



FIG. 3: MADS-based search algorithm with 4 passes, where 
Ao = A c in the refinement stage and Ao = Amle in the zoom 
stage. 



Note, however, that the bounding boxes AA only serve 
as a necessary input parameter to the NOMAD search 
algorithm, while the effective MADS search region can 
be further reduced by rejecting points that do not sat- 
isfy a given constraint. For example, the effective search 
region in the zoom stage always consists of the Fisher 
ellipse Eq. ([25]). 

The initial step sizes d\ l are also empirically deter- 



mined, typically as some fraction of the search boundary 

We propose a 4 pass algorithm with equal (for simplic- 
ity) number of steps ?T. s tcps in each pass, however with 
different starting point and method of trial-point gener- 
ation: 

• 1st pass - starting point A c , deterministic point 
generation, 



• 2nd pass - starting point A c , stochastic point gen- 
eration, 

• 3rd pass - starting point loudest template from the 
first 2 passes, deterministic point generation, 

• 4th pass - starting point from the vicinity of the 
loudest point from the first 2 passes, stochastic 
point generation. 

In the zoom stage we terminate the search as soon as 
the loudest point of the current iteration satisfies the 
signal-confirmation condition (S) of Eq. (351. In lower- 
dimensional cases, such as the directed search considered 
later, a single pass is therefore often found to be suffi- 
cient. For later usage we introduce the total number of 
MADS iterations nj as the sum of the number of steps 
in each pass. 



F. MADS-followup computing cost 

Contrary to grid-based searches, the computing cost 
of the MADS based algorithm is non-deterministic, due 
to the a-priori unknown number of explored parameter 
space points. To estimate the maximal computing cost of 
the refinement or the zoom stage using Eq. ( 19 ), we need 



the maximal number of possibly evaluated templates 



AC 



i=0 



(42) 



where p l max is the user-specified maximum of the number 
of computed templates at MADS iteration i. This max- 
imal number is typically chosen large to avoid too early 
interruption of the MADS instance, e.g. when further 
improvement of the current solution is possible while the 
extremum is not yet found. However, if the extremum is 
found, a MADS iteration starting from this point termi- 
nates rapidly. 

Note that the fundamental computing cost Cg FT in 
stochastic searches over the sky is typically larger than 
in a grid-based search, where a lot of templates with dif- 
ferent spindown components can be computed at fixed 
sky position. This results in a larger value of about 
Cq FT m 3 x 1CT 7 s instead of the number quoted in 

sec. inn3 



G. False-alarm and detection probability 

After the final fully-coherent zoom stage we are left 
with a candidate falling in one of the three categories 
discussed earlier, namely: the candidate is consistent 
with the signal model (S), with Gaussian noise (G) or 
is of non- Gaussian origin but inconsistent with the sig- 
nal model. An additional valuable piece of information is 
the false-alarm probability associated with the candidate. 
This is the probability of exceeding a threshold 2J-- value 
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in the absence of a signal, where the relevant distribu- 
tion is the central \ 2 distribution with 4 degrees of free- 
dom, denoted as x^{2JF). The single-template false-alarm 
probability is 



PiA = I d{2F) X \{2F) 

= (1 + J- th )e^ th , 
and for J\f independent templates this results in 



(43) 



(44) 



where for Np\ A <C 1, Taylor expansion yields j»fA ~ 

~(G) 

JvP[ A - For example, a threshold of 2J r th = 70 for a 
search with A/" = 1 x 10 5 templates corresponds to a false- 
alarm probability of pia < 2 x 10~ 9 , where the upper 
bound corresponds to J\f completely independent tem- 
plates. 

The overall detection probability of the follow-up 
method depends on the signal SNR. Higher SNR in the 
refinement stage yields better localization of the signal, 
i.e. a smaller Fisher ellipse and thus also a higher prob- 
ability of signal recovery (35). In addition, the MADS- 



algorithm parameters also affect the detection efficiency, 
e.g. an increased number of MADS iterations increases 
the detection probability especially for signals with lower 
SNR. Because of this, the detection probability will have 
to be estimated empirically in a Monte Carlo study, see 
Figs. [4c] and [5c] 



IV. MONTE CARLO STUDIES 

To demonstrate the capability of the systematic follow- 
up procedure proposed in Section |III| we perform two 
different types of Monte Carlo (MC) studies. 

In the first case we simulate a so-called directed search 
for a fixed sky position, where we follow up candidates 
in a 2-dimensional spindown space, i.e. {/,/}■ In the 
second case we simulate an all-sky search over the 4- 
dimensional parameter space {a, <5, /, /}. 

All MADS searches are implemented using the MADS 
reference library NOMAD and the LAL library from 
the LALSUITE [25] is used for the .F-statistic compu- 
tation [30] . The Gaussian data and signal injections 
are produced using the LALAPPS programs from LAL- 
SUITE. In particular with lalapps_Makef akedata_v4 we 
create data sets of total duration T — 200 d, with 
N = 200 segments of duration AT = 1 d, using SFTs 
of length T SFT = 1800 s, for the two LIGO detectors 
HI and LI. The noise level per detector is generated as 
Gaussian white noise with a power-spectral density S n of 

^s; l = 2x io-^hz- 1 / 2 . 

Independently of the type of the search, the initial can- 
didates to follow-up are prepared as follows. Rather than 
performing a semi-coherent grid based search using the 
Hough- [5 or GCT-method [5J, we generate candidates 



by drawing a random point in the vicinity of the injection 
and consider it a candidate if the semi-coherent metric 
mismatch fi* is within the range 



L 1 * € [0, 1] 



(45) 



see Figs. 4a and 5a This procedure for candidate 
preparation allows us to separate the study of the follow- 
up algorithm from the problem of how to setup a semi- 
coherent search, which is a difficult question on its own. 

Note that even if the original grid-based semi-coherent 
search does not produce candidates that conform with 
Eq. ( 45 1 , we can always increase the density of the grid 



until (45) applies. This would amount to a (cheap) pre- 
processing stage inserted before the present follow-up 
procedure . 



A. Follow-up of candidates from a directed search 

For the directed type of searches we fix the sky position 
to the coordinates of the Galactic Center. This choice is 
arbitrary and we could use any other point without qual- 
itatively changing the results. We create 5000 data sets. 
Note that each data set has different Gaussian noise real- 
ization in which a CW signal from an isolated source is in- 
jected. In the process of injection, the original noise data 
set is also used to examine the behavior of the follow-up 
method in the absence of a signal. 

The pulsar injection parameters X s are drawn uni- 
formly in the range / G (50,51) Hz, cosi G (—1,1), 
rp G (—7t/4, 7r/4) and fa G (0, 2n). The signal ampli- 
tude ho is chosen such that the expected average -SNR 2 of 
Eq. Q f or a perfect match is distributed uniformly in the 
range p 2 s G (0, 2). The spindown / is chosen uniformly in 
the range / G (— iaa, with minimal spindown age 
r min = 300 yr at T fZa ="50 Hz. The MADS-algorithm 
parameters used in the MC are summarized in Table [TJ 
which have been found empirically to achieve good re- 
sults. For this type of follow-up we find that the 1st 



stage 


w 


W min 


"'max 


u b 


ym ax 


R 


-1 


1 


l 


2 


20000 


Z 


-1 


1 


50 


1.1 


20000 



TABLE I: Algorithm parameters for follow-up of candidates 
from directed searches. 

pass of the search algorithm in the refinement stage and 
only two repetitions of the 2nd pass in the zoom stage 
is sufficient. We restrict the size of the search box for 
the refinement stage A\r by taking 1 frequency and 2 
first spindown metric extents. In the zoom stage we con- 
strain the parameter space to a Fisher ellipse Eq. (25) 
with riB — 24. 

We first apply the follow-up chain to the pure 
Gaussian-noise data without injected signals. The corre- 
sponding 2J- z distribution of the resulting fully- coherent 
zoom stage is plotted in Fig. 4b The maximal value 
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(a) SNR loss of the initial candidates fi versus semi-coherent 
metric mismatch fi* to the closest template. 




(b) IT z distribution after the fully-coherent 2-D {/, /} zoom 
stage of 5000 directed searches in pure Gaussian noise without 
injected signal. The maximal 2J r -value found is 



2T 7 



51.61. The mean value (2T z) 
a dotted line. 



29.00 is plotted with 
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(c) Percentage of the 5000 injected signals classified as 
recovered (— 5) and of non-Gaussian origin fx —<G) as 
function of the non-centrality parameter p|, Eq. 171. The error 
bars are computed by using a Jackknife estimator. 
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(d) Upper plot: computing cost of the semi-coherent 
refinement stage. Middle plot: computing cost of the 
fully-coherent zoom stage. Lower plot: total computing cost. 



FIG. 4: Monte Carlo study of 2-stage follow-up of candidates from a directed {/, /} semi-coherent search pointed toward the 
Galactic Center with N = 200 segments of duration AT = 1 d. 



found is 2J- z = 51.61. We therefore use a threshold 
for the classification of non-Gaussian candidates (->G) of 

-(G) 



2F. 



th 



60, which is safely above this level. 



We next apply the follow-up chain to the Gaussian- 
noise data with injected signal. In Fig. [4c] we plot the 
percentage of injected signals that are classified as recov- 
ered signals (S) and non-Gaussian origin (~>G), as a func- 
tion of the injected signal strength p 2 s . From this plot we 
can read out the detection probability, namely we reach 
90% of signal recovery for candidates with p 2 s w 0.7. 

The computing cost as function of p 2 s is plotted in 



negligible and in the zoom stage the averaged computing 
time decreases with higher signal strength. 



B. Follow-up of candidates from an all-sky search 

The data and signal preparation for the following all- 
sky Monte Carlo study is the same as in the directed 
search case, however the sky position is drawn isotropi- 
cally over the whole sky. We create 7500 data sets with 
uniformly distributed injected average -SNR 2 in the range 



Fig. 4d We notice that the cost of the refinement stage is p 2 . € (0, 3). The algorithm parameters used in the refine- 
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(a) SNR loss of the initial candidates fi versus semi-coherent 
metric mismatch fi* . 




(b) 2J- ' z distribution after the fully-coherent 4-D 
{a, 8, /, /} zoom stage of 7500 searches in pure Gaussian 
noise, without injected signal. The maximal 2J r -value 



found is 2? '» 



: 58.76. The mean value is (2T z ) = 37.50 
indicated with dots. 
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(c) Percentage of the 7500 injected signals classified as 
recovered (— S) and of non-Gaussian origin ( ~<G) as 



15 
j[10 
C? 5 


4 

„3 

o 



0.0 



0.0 



1.0 



2.0 



1.0 



1.5 



2.0 



2.5 



3.0 



3.0 




(d) Upper plot: computing cost of the semi-coherent 
refinement stage. Middle plot: computing cost of the 



function of the signal strength p\. The error bars are computed fully-coherent zoom stage. Lower plot: total computing cost, 
using a Jackknife estimator. 



FIG. 5: Monte Carlo study of 2-stage follow-up of candidates from an all-sky {a, 5, f, /} semi-coherent search with JV = 200 
segments of duration AT = Id. 



ment and zoom stage are given in Table [TTJ which have 
been found empirically to yield good performance. We 
also find that here the zoom stage benefits from perform- 
ing all 4 search passes shown in Fig. [3] The size of the 
search box for the refinement stage in the spindown sub- 
space has been defined exactly as in the directed search 
example. The sky subspace is constrained by using an 
m = 1 iso-mismatch ellipse. As in the previous example 
we use n-Q — 24 in Eq. ( 25 ) to determine the size of the 
Fisher ellipse. 

Similarly to the directed follow-up, we first test the 
pipeline using the Gaussian noise data without injections. 
The resulting distribution of final 2Fz values is plotted 



stage 
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"^raax 


Ub 
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-1 
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20000 
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-1 
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50 


1.2 


20000 



TABLE II: Follow-up algorithm parameters for full parameter 
space searches. 



in Fig. 



5b 



The maximal value found is 2J^ Z = 58.76, 



which is higher compared to the value found in the di- 
rected follow-up searches due to the increased number 
of evaluated templates. We therefore use a threshold 
for the classification of non-Gaussian candidates (->G) of 

~(G) 

2J r th = 70, which is safely above this level. 
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Next we search the data containing the injected sig- 
nals. In Fig. [5c] we plot the fraction of signals classified 
as recovered (S) and the percentage of MC trials found 
to be of non-Gaussian origin (-<G), as a function of the 
injected signal strength p 2 s In order to achieve 90% sig- 
nal recovery (S), we now need stronger signals, namely 
p 2 > 1.7. However, for p 2 1.5 we can already achieve 
90% "detection probability" in the sense of separating 
candidates from Gaussian noise (^G). This indicates 
that the zoom step sometimes converges on a secondary 
maximum. Given that any non-Gaussian (~>G) candi- 
dates after zoom will receive further scrutiny, it would be 
straightforward to further explore the parameter space 
around such candidates to localize a potential primary 
maximum. 

The computing cost as a function of p 2 is plotted in 
Fig. [5dj We notice that the total computing cost is dom- 
inated by the zoom stage and the averaged computing 
time is rather independent of the signal strength. 

V. DISCUSSION 

We have studied a two-stage scheme for the fully- 
coherent follow-up of semi-coherent candidates. The first 
stage, called refinement, aims to find the maximum- 
likelihood estimator of the initial semi-coherent candi- 
date. This allows us to better constrain the parameter 
space for the coherent zoom stage. The two-stage scheme 
is suitable for following-up candidates from all-sky or di- 



rected semi-coherent searches. The proposed grid-less 
optimization lowers the computing cost per candidate to 
acceptable levels. In Monte Carlo studies we tested the 
efficiency of the algorithm for directed and all-sky follow- 
up searches. 

In this paper we restricted the all-sky follow-up op- 
timization to 4 dimensions, namely sky, frequency and 
first spindown. Further work is required to extend the 
optimization to higher dimensions. A related attrac- 
tive direction for further development is the extension 
and application of the search algorithm for follow-up of 
CW candidates in binary systems, which is a challenging 
higher-dimensional problem. 

We also aim to extend the two-stage scheme pre- 
sented here by including intermediate semi-coherent 
zoom stages. This should allow to further reduce the 
computing cost and increase detection efficiency. 
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